[Core] fix gRPC handlers' unlimited active calls configuration #25626
Conversation
src/ray/rpc/grpc_server.cc (outdated)

-  int buffer_size = 100;
+  // When there is no max active RPC limit, a call will be added to the completion
+  // queue before processing starts, so adding only 1 call is enough.
+  int buffer_size = 1;
I'm not very familiar with this, but does this mean we'll accept requests more slowly than before? For example, accepting requests one by one, because the buffer size is only 1 and a new call is added only after we do the actual processing.
The nightly tests do not show a big difference, but from reading the completion queue code, I can see that having a number of tags buffered can make processing more efficient. I adjusted the value to be closer to the existing one.
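To make the buffering concrete, here is a minimal self-contained sketch of the idea, not Ray's actual code; `ServerCallFactory`, `CreateCall`, and `PreBufferCalls` are illustrative stand-ins for the real completion-queue wiring:

```cpp
#include <iostream>

// Illustrative stand-in: in the real server, this registers one pending
// ServerCall as a tag on the gRPC completion queue via the generated
// async request API.
struct ServerCallFactory {
  void CreateCall() const { std::cout << "pending ServerCall tag posted\n"; }
};

// Pre-buffer request tags so the polling thread always has calls ready.
// With a finite max_active_rpcs, buffer that many; with no limit (-1),
// a small buffer suffices because the fixed server creates a replacement
// call as soon as each tag is received.
void PreBufferCalls(const ServerCallFactory &factory, int max_active_rpcs) {
  const int buffer_size = (max_active_rpcs == -1) ? 1 : max_active_rpcs;
  for (int i = 0; i < buffer_size; ++i) {
    factory.CreateCall();
  }
}

int main() {
  ServerCallFactory factory;
  PreBufferCalls(factory, /*max_active_rpcs=*/-1);  // unlimited mode: 1 buffered tag
}
```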
* master: (104 commits)
  * [Serve] Java Client API and End to End Tests (ray-project#22726)
  * [Docs] Small fix to AIR examples descriptions (ray-project#26227)
  * [Deployment Graph] Move `Deployment` creation outside to build function (ray-project#26129)
  * [K8s][Ray Operator] Ignore resource requests when detected container resources. (ray-project#26234)
  * Revert "[Core] Add retry exception allowlist for user-defined filteri… (ray-project#26289)
  * [ci] pin gpustat (ray-project#26311)
  * [tune] fix `set_tune_experiment` (ray-project#26298)
  * Revert "Revert "[AIR][Serve] Rename ModelWrapperDeployment -> PredictorDeployment"" (ray-project#26231)
  * [Release] Use nightly base images for release tests (ray-project#25373)
  * Revert "[Core] fix gRPC handlers' unlimited active calls configuration (ray-project#25626)" (ray-project#26202)
  * [RLlib] Some Docs fixes (2). (ray-project#26265)
  * [C++ worker] Refine worker context and more (ray-project#26281)
  * Fix file_system_monitor.cc message (ray-project#26143)
  * [Java] Make Java test more stable (ray-project#26282)
  * [air] Do not warn of `checkpoint_dir` if it's coming from us (base_trainer). (ray-project#26259)
  * [Datasets] Support drop_columns API (ray-project#26200)
  * [Datasets] Fix max number of actors for default actor pool strategy (ray-project#26266)
  * [ci] Stop syncer staging tests (ray-project#26273)
  * [core][gcs] Add storage namespace to redis storage in GCS. (ray-project#25994)
  * [workflow] Deprecate workflow.create (ray-project#26106)
  * ...
Why are these changes needed?
Ray's gRPC server wrapper configures a max active call setting for each handler. When the max active call is `-1`, the handler is supposed to handle an unlimited number of requests concurrently. However, in practice it is often observed that handlers configured with unlimited active calls still handle at most 100 requests concurrently. This is a result of the existing logic: the server buffers a fixed number of `ServerCall` objects (tags) in the completion queue before actual processing. The problem is that new `ServerCall`s are created on the event loop instead of the gRPC server thread. When the event loop runs a callback from the gRPC server, the callback creates a new `ServerCall` object, and can run the gRPC handler to completion if the handler does not have any async step. So overall, the event loop will never run more callbacks than the initial number of `ServerCall`s, which is 100 in the "unlimited" mode.

The solution is to create the new `ServerCall` on the gRPC server thread, before sending the received `ServerCall` to the event loop (a simplified sketch of the two flows follows below).

Running some nightly tests to verify the fix does not introduce instabilities: https://buildkite.com/ray-project/release-tests-branch/builds/652

Also looking into adding gRPC server / client stress tests with a large number of concurrent requests.
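For illustration, here is a simplified, self-contained model of the two dispatch paths. This is not the actual Ray source; `EventLoop`, `DispatchOld`, and `DispatchNew` are assumed names, and the event loop runs callbacks inline only for demonstration:

```cpp
#include <functional>
#include <iostream>

// Minimal stand-ins for the real types; all names here are illustrative.
struct ServerCall {
  void HandleRequest() { std::cout << "handling request\n"; }
};

struct ServerCallFactory {
  // In the real server this posts a new pending call (tag) on the gRPC
  // completion queue; here it just stands in for that side effect.
  void CreateCall() const { std::cout << "new pending ServerCall posted\n"; }
};

struct EventLoop {
  // Real event loops queue the callback; this sketch runs it inline.
  void Post(std::function<void()> cb) { cb(); }
};

// Buggy flow: the replacement ServerCall is created only once the event
// loop runs the callback. If the handler runs to completion synchronously,
// the loop can never have more callbacks in flight than the initial batch
// of buffered ServerCalls (100 in the old "unlimited" mode).
void DispatchOld(ServerCall *call, EventLoop &loop,
                 const ServerCallFactory &factory) {
  loop.Post([call, &factory] {
    factory.CreateCall();
    call->HandleRequest();
  });
}

// Fixed flow: the gRPC server polling thread creates the replacement call
// before handing the received call to the event loop, so the completion
// queue always has a fresh tag regardless of how long handlers take.
void DispatchNew(ServerCall *call, EventLoop &loop,
                 const ServerCallFactory &factory) {
  factory.CreateCall();
  loop.Post([call] { call->HandleRequest(); });
}

int main() {
  ServerCall call;
  ServerCallFactory factory;
  EventLoop loop;
  DispatchOld(&call, loop, factory);
  DispatchNew(&call, loop, factory);
}
```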
Related issue number
Checks
- I've run `scripts/format.sh` to lint the changes in this PR.